Analyzing the Amazon Mechanical Turk Marketplace
Since the concept of crowdsourcing is relatively new, many potential
participants have questions about the AMT marketplace. For example, a
common set of questions that pop up in an 'introduction to crowdsourcing
and AMT' session are the following: What types of tasks can be completed
in the marketplace? How much does it cost? How fast can I get results
back? How big is the AMT marketplace? The answers to these questions
remain largely anecdotal and based on personal observations and
experiences. To understand better what types of tasks are being
completed today using crowdsourcing techniques, we started collecting
data about the AMT marketplace. We present a preliminary analysis of the
dataset and provide directions for interesting future research.
Modeling Dependency in Prediction Markets
In the last decade, prediction markets have become popular forecasting tools
in areas ranging from election results to movie revenues and Oscar
nominations. One of the features that make prediction markets
particularly attractive for decision support applications is that they
can be used to answer what-if questions and estimate probabilities of
complex events. The traditional approach to answering such questions
involves running a combinatorial prediction market, which is not always
possible. In this paper, we present an alternative, statistical approach
to pricing complex claims, which is based on analyzing co-movements of
prediction market prices for basis events. Experimental evaluation of
our technique on a collection of 51 InTrade contracts representing the
Democratic Party Nominee winning Electoral College Votes of a particular
state shows that the approach outperforms traditional forecasting
methods such as price and return regressions and can be used to extract
meaningful business intelligence from raw price data.
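The statistical pricing idea can be illustrated with a minimal sketch: proxy the correlation of two binary outcomes by the correlation of their contracts' daily price returns, then price the conjunction claim through the standard identity for correlated Bernoulli indicators, clamped to the Fréchet bounds. This is an illustrative simplification, not the paper's actual estimator; all names below are made up.

```python
from statistics import mean, pstdev

def returns(prices):
    """Simple returns from a daily price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def corr(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

def price_conjunction(prices_a, prices_b):
    """Price the complex claim 'A and B' from two basis contracts.

    Uses P(A&B) = pA*pB + rho*sqrt(pA(1-pA)pB(1-pB)), where rho is the
    correlation of the outcome indicators; here we proxy rho with the
    correlation of daily returns (an assumption for illustration).
    The result is clamped to the Fréchet bounds so it stays a valid
    probability."""
    p_a, p_b = prices_a[-1], prices_b[-1]
    rho = corr(returns(prices_a), returns(prices_b))
    joint = p_a * p_b + rho * ((p_a * (1 - p_a) * p_b * (1 - p_b)) ** 0.5)
    return min(max(joint, max(0.0, p_a + p_b - 1)), min(p_a, p_b))
```

With perfectly co-moving basis contracts the estimate collapses to the upper Fréchet bound min(pA, pB), which is the expected behavior for fully dependent events.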
Summarizing and Searching Hidden-Web Databases Hierarchically Using Focused Probes
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from "uncooperative" databases by using "focused query probes," which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. The content summaries that result from this algorithm are efficient to derive and more accurate than those from previously proposed probing techniques for content-summary extraction. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to produce accurate results even for imperfect content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases.
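The probing idea can be sketched in a few lines: issue topic-focused keyword queries against the hidden database's search interface, and build a content summary (word-to-document-frequency map) from the retrieved sample. The toy database, probes, and topic-scoring rule below are hypothetical stand-ins, not the paper's full adaptive algorithm.

```python
from collections import Counter

# Toy stand-ins: a "hidden" database reachable only by keyword query,
# and topic probes derived from a classifier (all names hypothetical).
DATABASE = [
    "gene therapy trial results",
    "protein folding and gene expression",
    "stock market volatility report",
]
PROBES = {"Health": ["gene", "therapy"], "Business": ["stock", "market"]}

def query(db, keyword):
    """Simulate the search interface: return docs matching a keyword."""
    return [d for d in db if keyword in d.split()]

def focused_summary(db, probes):
    """Probe with topic-focused keywords and summarize retrieved docs.

    Returns (best_topic, content_summary), where the summary maps each
    word to its document frequency in the retrieved sample -- a rough
    sketch of the focused-probing idea, not the paper's algorithm."""
    sample, hits = set(), Counter()
    for topic, keywords in probes.items():
        for kw in keywords:
            docs = query(db, kw)
            hits[topic] += len(docs)
            sample.update(docs)
    summary = Counter(w for doc in sample for w in set(doc.split()))
    return hits.most_common(1)[0][0], summary

topic, summary = focused_summary(DATABASE, PROBES)
```

In the full technique the probes for the winning topic's subcategories would be issued next, zooming the sample in on the database's actual topic coverage.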
Modeling Volatility in Prediction Markets
Nowadays, there is significant experimental evidence of excellent ex-post predictive accuracy in certain types of prediction markets, such as markets for elections. This evidence shows that prediction markets are efficient mechanisms for aggregating information and are more accurate in forecasting events than traditional forecasting methods, such as polls. The interpretation of prediction market prices as probabilities has been extensively studied in the literature; however, little attention has so far been given to understanding the volatility of prediction market prices. In this paper, we present a model of a prediction market with a binary payoff on a competitive event involving two parties. In our model, each party has some underlying "ability" process that describes its ability to win and evolves as an Ito diffusion. We show that if the prediction market for this event is efficient and accurate, the price of the corresponding contract will also follow a diffusion and its instantaneous volatility is a particular function of the current claim price and its time to expiration. We generalize our results to competitive events involving more than two parties and show that volatilities of prediction market contracts for such events are again functions of the current claim prices and the time to expiration, as well as of several additional parameters (ternary correlations of the underlying Brownian motions). In the experimental section, we validate our model on a set of InTrade prediction markets and show that it is consistent with observed volatilities of contract returns and outperforms the well-known GARCH model in predicting future contract volatility from historical price data. To demonstrate the practical value of our model, we apply it to pricing options on prediction market contracts, such as those recently introduced by InTrade. Other potential applications of this model include the detection of significant market moves and improving forecast standard errors.
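The flavor of the two-party result can be sketched under simplified assumptions: if the gap between the two ability processes is a standard Brownian motion and the contract pays 1 when the gap is positive at expiration, the fair price is Phi(X_t / sqrt(tau)) with tau the time to expiration, and Ito's lemma gives an instantaneous volatility that depends only on the current price and tau. This is a sketch of that special case, not the paper's general derivation.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal

def contract_volatility(price, tau):
    """Instantaneous volatility of a binary prediction-market contract.

    Assumes the underlying "ability gap" is a standard Brownian motion,
    so price_t = Phi(X_t / sqrt(tau)), tau = time to expiration.
    Ito's lemma then yields vol = phi(Phi^{-1}(price)) / sqrt(tau),
    a function of the current price and tau only (simplified sketch)."""
    return N.pdf(N.inv_cdf(price)) / sqrt(tau)
```

Note how the volatility blows up as tau shrinks for prices away from 0 or 1, which matches the familiar pattern of large late price swings in close races.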
Estimating the Socio-Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics
With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we re-examine the impact of reviews on economic outcomes like product sales and see how different factors affect social outcomes like the extent of their perceived usefulness. Our approach explores multiple aspects of review text, such as the lexical, grammatical, semantic, and stylistic levels, to identify important text-based features. In addition, we also examine multiple reviewer-level features, such as the average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective and highly subjective sentences have a negative effect on product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are considered more informative (or helpful) by the users. By using Random Forest-based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. Reviews for products that have received widely fluctuating reviews also have reviews of widely fluctuating helpfulness. In particular, we find that highly detailed and readable reviews can have low helpfulness votes in cases when users tend to vote negatively not because they disapprove of the review quality but rather to convey their disapproval of the review polarity.
We examine the relative importance of the three broad feature categories: 'reviewer-related' features, 'review subjectivity' features, and 'review readability' features, and find that using any of the three feature sets results in performance statistically equivalent to using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their socio-economic impact. Our results can have implications for the judicious design of opinion forums.
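The kind of text-based features discussed above can be pictured with a minimal extractor: sentence counts, average sentence and word length as crude readability proxies, and the share of opinion words as a crude subjectivity proxy. The feature names and the tiny opinion lexicon are illustrative stand-ins, not the paper's actual feature set.

```python
import re

def review_features(text):
    """Extract simple review-level text features (illustrative stand-ins
    for the readability and subjectivity features discussed above)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Crude subjectivity proxy: share of words from a tiny opinion
    # lexicon (hypothetical, for illustration only).
    opinion = {"great", "terrible", "love", "hate", "awful", "amazing"}
    return {
        "num_sentences": len(sentences),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        "subjectivity": sum(w.lower() in opinion for w in words) / max(len(words), 1),
    }

f = review_features("I love this camera. The battery life is great. Autofocus is slow.")
```

Feature vectors of this kind, together with reviewer-level features, would then be fed to a classifier (e.g., a Random Forest) to predict helpfulness votes or sales impact.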
Demographics of Mechanical Turk
We present the results of a survey that collected information about the
demographics of participants on Amazon Mechanical Turk, together with
information about their level of activity and motivation for working on
Amazon Mechanical Turk. We find that approximately 50% of the workers
come from the United States and 40% come from India. Country of origin
tends to change the motivating reasons for workers to participate in the
marketplace. Significantly more workers from India participate on
Mechanical Turk because the online marketplace is a primary source of
income, while in the US most workers consider Mechanical Turk a
secondary source of income. While money is a primary motivating reason
for workers to participate in the marketplace, workers also cite a
variety of other motivating reasons, including entertainment and education.
QProber: A System for Automatic Classification of Hidden-Web Resources
The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web "crawlers." Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. Here, we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
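The match-count idea can be sketched as follows: each category contributes a handful of probe queries, the database reports only how many documents match each probe, and the database is assigned to categories whose probes account for a large enough share of the matches. The probes and threshold below are made-up placeholders; in QProber the probes come from a trained document classifier and the coverage/specificity thresholds are tuned.

```python
# Hypothetical probes per category (QProber generates these
# automatically from a trained document classifier's rules).
PROBES = {
    "Sports": ["baseball", "playoff"],
    "Health": ["cancer", "vaccine"],
}

def match_count(db, keyword):
    """Simulate the search interface: number of matches for a one-word
    query (QProber needs only this count, not the documents)."""
    return sum(keyword in doc.lower().split() for doc in db)

def classify(db, probes, specificity=0.5):
    """Assign the database to categories whose probes account for a
    large enough share of all probe matches -- a bare-bones sketch of
    QProber's coverage/specificity rule, with a made-up threshold."""
    counts = {c: sum(match_count(db, kw) for kw in kws)
              for c, kws in probes.items()}
    total = sum(counts.values()) or 1
    return [c for c, n in counts.items() if n / total >= specificity]

db = ["the baseball playoff race", "a playoff upset", "vaccine news"]
```

Because only match counts are needed, classification costs a handful of cheap queries rather than a document crawl.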
The Dimensions of Reputation in Electronic Markets
We present a framework for identifying the different dimensions of online reputation and characterizing
their influence on the pricing power of sellers. Our theory predicts that sellers with better recorded online
reputation can successfully charge higher prices than competing sellers of identical products, and that their
pricing power increases with their recorded level of experience. We develop and implement a new text mining
technique that identifies and quantitatively assesses dimensions of importance in reputation profiles, and use
this technique to create a new data set containing detailed reputation profiles and prices for sellers in over
9,500 transactions for consumer software on Amazon.com's online secondary marketplace. The estimation
of a set of econometric models on this data set validates the predictions of our theory, and further, ranks
these dimensions of reputation based on their effect on measured seller value, identifying those that have
the most significant impact on reputation. This paper is the first study that integrates econometric and text
mining techniques toward a more complete analysis of the information captured by reputation systems, and
it presents new evidence of the importance of their effective and judicious design.
Information Systems Working Papers Series
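One simple way to picture the dimension-identification step: pull modifier-noun pairs out of feedback text and score each noun "dimension" by the polarity of the words that modify it. The fixed polarity lexicon and regex pairing below are hypothetical simplifications; the paper's technique assesses dimensions quantitatively from the data rather than from a word list.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon; the actual technique scores
# dimensions from data rather than from a fixed word list.
POLARITY = {"fast": 1, "slow": -1, "careful": 1, "sloppy": -1, "quick": 1}

def reputation_dimensions(feedback_posts):
    """Score reputation dimensions (nouns) by the polarity of the
    modifiers that precede them across feedback posts. The lookahead
    in the regex lets consecutive word pairs overlap."""
    scores = defaultdict(int)
    for post in feedback_posts:
        for adj, noun in re.findall(r"(\w+)\s+(?=(\w+))", post.lower()):
            if adj in POLARITY:
                scores[noun] += POLARITY[adj]
    return dict(scores)

dims = reputation_dimensions(["Fast shipping, careful packaging.",
                              "Slow shipping but quick refund."])
```

Aggregated over many transactions, per-dimension scores of this kind could then enter a pricing regression as separate reputation variables.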
Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support
Many online or local data sources provide powerful querying mechanisms
but limited ranking capabilities. For instance, PubMed allows users to
submit highly expressive Boolean keyword queries, but ranks the query
results by date only. However, a user would typically prefer a ranking
by relevance, measured by an Information Retrieval (IR) ranking
function. The naive approach would be to submit a disjunctive query with
all query keywords, retrieve the returned documents, and then re-rank
them. Unfortunately, such an operation would be very expensive due to
the large number of results returned by disjunctive queries. In this
paper we present algorithms that return the top results for a query,
ranked according to an IR-style ranking function, while operating on top
of a source with a Boolean query interface with no ranking capabilities
(or a ranking capability of no interest to the end user). The algorithms
generate a series of conjunctive queries that return only documents that
are candidates for being highly ranked according to a relevance metric.
Our approach can also be applied to other settings where the ranking is
monotonic on a set of factors (query keywords in IR) and the source
query interface is a Boolean expression of these factors. Our
comprehensive experimental evaluation on the PubMed database and a TREC
dataset shows that we achieve an order-of-magnitude improvement compared to
the current baseline approaches.
Vagelis Hristidis was partly supported by NSF grant IIS-0811922 and DHS
grant 2009-ST-062-000016. Panagiotis G. Ipeirotis was supported by the
National Science Foundation under Grant No. IIS-0643846.
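The query-rewriting idea above can be sketched as follows: instead of one expensive disjunction, issue conjunctive subqueries in decreasing order of the maximum score their results can achieve, and stop as soon as no remaining subquery can push a new document into the top-k. The idf-sum scoring function and Boolean interface below are simplified stand-ins for the paper's IR-style setup.

```python
from itertools import combinations

def idf_score(doc_terms, query, idf):
    """Monotonic IR-style score: sum of idf over matched query terms."""
    return sum(idf[t] for t in query if t in doc_terms)

def boolean_and(db, terms):
    """Simulate a Boolean interface: docs containing ALL the terms."""
    return [d for d in db if set(terms) <= set(d.split())]

def top_k(db, query, idf, k=2):
    """Issue conjunctive subqueries from strongest to weakest, stopping
    early once no remaining subquery can improve the current top-k
    (a bare sketch of the early-termination idea, not the paper's
    full algorithm)."""
    subqueries = [c for r in range(len(query), 0, -1)
                  for c in combinations(query, r)]
    subqueries.sort(key=lambda q: sum(idf[t] for t in q), reverse=True)
    seen, results = set(), []
    for q in subqueries:
        kth = sorted(results, reverse=True)[k - 1] if len(results) >= k else 0.0
        if sum(idf[t] for t in q) <= kth:
            break  # no unseen doc can score above the current k-th
        for doc in boolean_and(db, q):
            if doc not in seen:
                seen.add(doc)
                results.append(idf_score(set(doc.split()), query, idf))
    return sorted(results, reverse=True)[:k]
```

Because the score is monotonic in the matched terms, a document not retrieved by any subquery issued so far cannot outscore the bound at which the loop breaks, so the early stop is safe.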